Pokemon classification with a Support Vector Machine
BSHT Michielsen MSc
This notebook demonstrates how to use a Support Vector Machine (SVM) for image classification. Image recognition is the ability of a computer to identify an object in an image based on the visual characteristics of that object. This is a classification problem: each possible object is a class, and a provided image should be assigned to one specific class with as high a certainty as feasible. Training such a classification model requires a large number of images of the same object. Relative to this notebook there should be a folder named pokemon-data in which several Pokemon images are found. These images are a subset of the Pokemon collection by Lance Zhang, picked because the selected Pokemon have strikingly different colors, so the machine can hopefully distinguish them fairly well. More images of the same Pokemon, or even different Pokemon, can be downloaded and added to the data folder.
First, the versions of the required libraries are shown. It is always wise to report the versions of the libraries used, so that if problems arise in the future one can still go back to a state in which the notebook worked.
import copy, pathlib, math
import PIL.Image as Image
import sklearn
import numpy
import matplotlib
import matplotlib.pyplot as plt
print("scikit-learn version:", sklearn.__version__) # 1.3.0
print("numpy version:", numpy.__version__) # 1.24.0
print("matplotlib version:", matplotlib.__version__) # 3.7.2
scikit-learn version: 1.3.0
numpy version: 1.24.0
matplotlib version: 3.7.2
Data provisioning
In real life the data provisioning phase would likely include more steps around data sourcing and data quality; for demo purposes, this notebook is restricted to merely loading the images from the data folder, without any concern for quantity or quality.
The code below loads the images and treats the subfolder names as the class labels. It is important that all images are the same size (and, in this case, square), so the code resizes them automatically. If high-resolution images are available, the size parameter can be increased, which will probably improve performance slightly at a significantly increased training time. The chosen size of 256 is a compromise that should give fair results at a reasonable training time.
size = 256

def load_image(file, size):
    # Convert to RGB so grayscale or RGBA files also yield three channels.
    img = Image.open(file).convert("RGB")
    img = img.resize((size, size))
    return numpy.array(img).flatten()

def load_labelled_images(path, size):
    labels = list()
    files = list()
    for file_info in pathlib.Path(path).glob("**/*.jpg"):
        # The name of the subfolder containing the file is the class label.
        labels.append(file_info.parent.name)
        files.append(str(file_info))
    imgs = numpy.array([load_image(f, size) for f in files])
    return imgs, numpy.array(labels)
images, labels = load_labelled_images("./pokemon-data", size)
print("Loaded", len(images), "images in the following", len(numpy.unique(labels)), "classes:")
for label in numpy.unique(labels):
    print(label)
Loaded 251 images in the following 10 classes:
Bulbasaur
Charizard
Charmander
Electrode
Jolteon
Mewtwo
Pikachu
Squirtle
Zubat
chikorita
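Beyond the class names, it is also worth checking how many images each class has, since very small classes are hard to learn. A minimal sketch with numpy, using a toy labels array as a stand-in for the real labels loaded above:

```python
import numpy

# A quick balance check: per-class image counts reveal whether some
# Pokemon are under-represented before any model is trained.
labels = numpy.array(["Pikachu", "Zubat", "Pikachu", "Mewtwo", "Pikachu"])
classes, counts = numpy.unique(labels, return_counts=True)
for cls, n in zip(classes, counts):
    print(cls, n)
```

Replacing the toy array with the real labels array shows the counts for the actual dataset.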
Sample the data
To get an impression of the data, a sample of the loaded images is plotted below to see whether they were loaded correctly. The parameter sample_size can be increased if more images should be shown.
sample_size = 24
plotimgs = copy.deepcopy(images)
numpy.random.shuffle(plotimgs)
rows = plotimgs[:sample_size]
_, subplots = plt.subplots(nrows=math.ceil(sample_size / 8), ncols=8, figsize=(18, int(sample_size / 3)))
subplots = subplots.flatten()
for i, x in enumerate(rows):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
Preprocessing
Given that this case uses images, there is no classical feature selection: one cannot decide beforehand that some pixels are better indicators than others. Therefore, there is little to do in terms of preprocessing other than splitting the dataset into a train set and a test set.
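Although individual pixels cannot be hand-picked as features, a dimensionality reduction such as PCA could still compress the roughly 196,608 pixel columns into a handful of components before the SVM sees them. This is not done in this notebook; the sketch below just illustrates the idea on random stand-in data:

```python
from sklearn.decomposition import PCA
import numpy

# Stand-in for 40 flattened images with 300 "pixels" each.
rng = numpy.random.default_rng(0)
X = rng.random((40, 300))

# PCA projects the high-dimensional pixel vectors onto the 10 directions
# of highest variance, shrinking each sample to 10 numbers.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (40, 10)
```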
Splitting into train/test
A split of 70%/30% is chosen here in order to have a fairly large number of testing images.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=.3, random_state=0)
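Note that train_test_split as called above splits at random, so a small class can end up almost entirely in one half. Passing the stratify argument keeps the class proportions equal in both halves. A minimal sketch with toy stand-ins for the images/labels arrays:

```python
from sklearn.model_selection import train_test_split
import numpy

# Toy stand-ins for the real images and labels arrays.
X = numpy.arange(40).reshape(20, 2)
y = numpy.array(["a"] * 10 + ["b"] * 10)

# stratify=y keeps the class proportions identical in train and test,
# which helps small classes appear in both halves of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
print(sorted(y_test))
```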
Modelling
In this step the model, a Support Vector Machine for classification, is fitted on the training set only.
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
Accuracy: 0.6973684210526315
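SVMs are sensitive to feature scale, and raw pixel values run from 0 to 255. A pipeline that standardises every pixel column before the SVC is an easy experiment; whether it actually helps on these particular images is an open question. A sketch on tiny synthetic stand-in data:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import numpy

# The pipeline scales each feature column to zero mean and unit variance
# before the SVC sees it; fit/score work exactly like a plain SVC.
model = make_pipeline(StandardScaler(), SVC())

# Tiny synthetic stand-in for the flattened image arrays.
X = numpy.vstack([numpy.zeros((10, 4)), numpy.full((10, 4), 255.0)])
y = numpy.array([0] * 10 + [1] * 10)
model.fit(X, y)
print(model.score(X, y))
```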
Evaluation
Below, a classification report is printed, showing how well the model performed for each class.
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
report = classification_report(y_test, predictions)
print(report)
precision recall f1-score support
Bulbasaur 0.62 0.71 0.67 7
Charizard 0.00 0.00 0.00 1
Charmander 0.33 0.60 0.43 5
Electrode 0.94 0.89 0.91 18
Jolteon 0.00 0.00 0.00 4
Mewtwo 0.35 0.75 0.48 8
Pikachu 0.92 0.92 0.92 12
Squirtle 1.00 0.29 0.44 7
Zubat 0.91 0.91 0.91 11
chikorita 0.00 0.00 0.00 3
accuracy 0.70 76
macro avg 0.51 0.51 0.48 76
weighted avg 0.71 0.70 0.67 76
c:\Users\arthu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
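The warnings come from classes that were never predicted, where precision is 0/0 and therefore undefined. The zero_division argument of classification_report substitutes a defined value instead of warning, as this small example shows:

```python
from sklearn.metrics import classification_report

y_true = ["a", "a", "b"]
y_pred = ["a", "a", "a"]  # class "b" is never predicted

# zero_division=0 reports 0.0 for the undefined precision of class "b"
# without emitting an UndefinedMetricWarning.
report = classification_report(y_true, y_pred, zero_division=0)
print(report)
```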
The report shows that several classes are hard to recognize: Charizard, Jolteon and chikorita were never predicted correctly (though each has only a handful of test images), and Mewtwo's low precision means other Pokemon are often mistaken for it. The code below plots every Pokemon in the test set, with the predicted label and whether that prediction was correct.
_, subplots = plt.subplots(nrows=math.ceil(len(X_test) / 4), ncols=4, figsize=(15, len(X_test)))
subplots = subplots.flatten()
for i, x in enumerate(X_test):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
    subplots[i].set_title(predictions[i] + (" (correct)" if predictions[i] == y_test[i] else " (wrong)"))
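A confusion matrix is a compact complement to the per-image plot: rows are true labels, columns are predictions, so off-diagonal cells show which classes get mixed up with which. A minimal sketch with two toy classes:

```python
from sklearn.metrics import confusion_matrix
import numpy

# Toy labels: one true Pikachu is misclassified as Mewtwo.
y_true = ["Pikachu", "Pikachu", "Mewtwo", "Mewtwo"]
y_pred = ["Pikachu", "Mewtwo", "Mewtwo", "Mewtwo"]

# labels fixes the row/column order: [Mewtwo, Pikachu].
cm = confusion_matrix(y_true, y_pred, labels=["Mewtwo", "Pikachu"])
print(cm)  # [[2 0]
           #  [1 1]]
```

Passing the real y_test and predictions (with labels=numpy.unique(labels)) gives the full 10x10 matrix for this notebook's classes.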
Even a relatively simple Support Vector Machine with just minutes of training time can do reasonably well at image recognition. A deep-learning CNN would probably do even better, but at a much greater cost in training resources and time. As the number of Pokemon grows, and Pokemon with similar colours are added, this model's quality is likely to degrade quite rapidly; at that point the quality of the images should also be improved to help the machine. For example, the current images are of rather low resolution and some have significant background noise. Cleaner, high-quality, high-resolution images may improve the general outcome.
Hyperparameter C
C = 0.5
from sklearn.svm import SVC
model = SVC(C = 0.5)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
Accuracy: 0.4473684210526316
C = 1.0
from sklearn.svm import SVC
model = SVC(C = 1.0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
Accuracy: 0.6973684210526315
C = 2.0
from sklearn.svm import SVC
model = SVC(C = 2.0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
Accuracy: 0.75
Explanation
C controls the width of the margin and thereby the balance between overfitting and over-smoothing: the higher the value of C, the narrower the margin, and the greater the risk of overfitting.
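Choosing C by scoring on the test set, as above, risks tuning the model to that one split. Cross-validating over a grid of C values on the training data is a safer procedure; a sketch using load_iris as a small stand-in for the image data:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

# Stand-in dataset; in the notebook this would be X_train, y_train.
X, y = load_iris(return_X_y=True)

# GridSearchCV refits an SVC per C value and scores each with
# 5-fold cross-validation, then keeps the best-scoring C.
search = GridSearchCV(SVC(), {"C": [0.5, 1.0, 2.0]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```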
Hyperparameter kernel
linear
Suitable for linearly separable data, where the classes can be separated by a straight line or hyperplane.
poly
Short for polynomial: this kernel suits polynomial relationships and can capture higher-order interactions between features.
rbf
Short for Radial Basis Function. This kernel is the default because it is versatile: it works well for data with complex relationships and is often used when you are unsure about the structure of the data.
sigmoid
Used for data that follows a sigmoid (S-shaped) pattern.
model = SVC(kernel="linear")
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy with linear:", score)
model = SVC(kernel="poly")
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy with poly:", score)
model = SVC(kernel="rbf")
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy with rbf:", score)
model = SVC(kernel="sigmoid")
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy with sigmoid:", score)
Accuracy with linear: 0.7894736842105263
Accuracy with poly: 0.7368421052631579
Accuracy with rbf: 0.6973684210526315
Accuracy with sigmoid: 0.2236842105263158
Findings
The accuracy of sigmoid is very low, so my conclusion would be that the data does not follow a sigmoid pattern. I would also expect poly to score low when linear scores high, and vice versa.
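That linear scores highest here is plausible: with far more features than samples (196,608 pixels versus roughly 250 images), classes are often linearly separable. For that linear case, LinearSVC is a faster dedicated solver than SVC(kernel="linear"); a sketch on synthetic stand-in data with the same many-features/few-samples shape:

```python
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# 50 samples with 500 features each: far more features than samples,
# mimicking the flattened-pixel situation in this notebook.
X, y = make_classification(n_samples=50, n_features=500, random_state=0)

# LinearSVC solves the linear-kernel problem directly, which scales
# better to high-dimensional data than the generic SVC.
model = LinearSVC(max_iter=10000)
model.fit(X, y)
print(model.score(X, y))
```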
Moaaahhhh Pokemon
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=.3, random_state=0)
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
report = classification_report(y_test, predictions)
print(report)
Accuracy: 0.6973684210526315
precision recall f1-score support
Bulbasaur 0.62 0.71 0.67 7
Charizard 0.00 0.00 0.00 1
Charmander 0.33 0.60 0.43 5
Electrode 0.94 0.89 0.91 18
Jolteon 0.00 0.00 0.00 4
Mewtwo 0.35 0.75 0.48 8
Pikachu 0.92 0.92 0.92 12
Squirtle 1.00 0.29 0.44 7
Zubat 0.91 0.91 0.91 11
chikorita 0.00 0.00 0.00 3
accuracy 0.70 76
macro avg 0.51 0.51 0.48 76
weighted avg 0.71 0.70 0.67 76
c:\Users\arthu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
Conclusion
My conclusion is that my sample sizes are probably too small compared to the given data. That is probably why the accuracy is lower.
Your own images
size = 256

def load_image(file, size):
    # Convert to RGB so grayscale or RGBA files also yield three channels.
    img = Image.open(file).convert("RGB")
    img = img.resize((size, size))
    return numpy.array(img).flatten()

def load_labelled_images(path, size):
    labels = list()
    files = list()
    for file_info in pathlib.Path(path).glob("**/*.jpg"):
        # The name of the subfolder containing the file is the class label.
        labels.append(file_info.parent.name)
        files.append(str(file_info))
    imgs = numpy.array([load_image(f, size) for f in files])
    return imgs, numpy.array(labels)
images, labels = load_labelled_images("./clash_royale-data", size)
print("Loaded", len(images), "images in the following", len(numpy.unique(labels)), "classes:")
for label in numpy.unique(labels):
    print(label)
Loaded 50 images in the following 2 classes:
Goblins
Hogrider_arto
sample_size = 24
plotimgs = copy.deepcopy(images)
numpy.random.shuffle(plotimgs)
rows = plotimgs[:sample_size]
_, subplots = plt.subplots(nrows=math.ceil(sample_size / 8), ncols=8, figsize=(18, int(sample_size / 3)))
subplots = subplots.flatten()
for i, x in enumerate(rows):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=.3, random_state=0)
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)
Accuracy: 0.7333333333333333
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
report = classification_report(y_test, predictions)
print(report)
precision recall f1-score support
Goblins 0.62 0.83 0.71 6
Hogrider_arto 0.86 0.67 0.75 9
accuracy 0.73 15
macro avg 0.74 0.75 0.73 15
weighted avg 0.76 0.73 0.74 15
_, subplots = plt.subplots(nrows=math.ceil(len(X_test) / 4), ncols=4, figsize=(15, len(X_test)))
subplots = subplots.flatten()
for i, x in enumerate(X_test):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
    subplots[i].set_title(predictions[i] + (" (correct)" if predictions[i] == y_test[i] else " (wrong)"))
Conclusion
I did this exercise together with Matthijs Dolmans. An accuracy of 73% is pretty good in my opinion; however, for a case with only two classes in clearly different colors, it is not great. This could be because we used the default kernel, while another kernel might excel.